I start by loading the packages and the data.
library(dplyr)
library(tidyr)
library(ggplot2)
library(plotly)
cData = read.csv("GlobalLandTemperaturesByState.csv")
Next, I keep only the United States, split the date into Year, Month and Day, and remove Hawaii and Alaska.
cData %>%
  filter(Country == "United States") %>%
  separate(col = dt, into = c("Year", "Month", "Day"), convert = TRUE) -> cData
# Drop rows with missing values
cData <- na.omit(cData)
cData %>%
  filter(State != "Hawaii" & State != "Alaska") -> cData1
# Remove NAs
cData1 = na.omit(cData1)
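To make the date-splitting step concrete, here is a minimal sketch on a hypothetical two-row data frame. The values in `toy` are made up, and the sketch assumes dates are written like `1850-01-01`; only the column names `dt` and `AverageTemperature` come from the code above. With `convert = TRUE`, `separate()` turns the pieces into integers, so later filters on `Year` compare numbers rather than strings.

```r
library(tidyr)

# Hypothetical stand-in for the CSV: same column names, made-up values
toy <- data.frame(
  dt = c("1850-01-01", "2013-09-01"),
  AverageTemperature = c(-1.2, 21.5),
  stringsAsFactors = FALSE
)

# By default separate() splits on runs of non-alphanumeric characters ("-" here);
# convert = TRUE then converts each piece to a suitable type (integers here)
toy_split <- separate(toy, col = dt, into = c("Year", "Month", "Day"), convert = TRUE)

str(toy_split)
```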
To calculate the average temperature for each year, I group by Year and then take the mean of AverageTemperature. I limit the analysis to years after 1850 so that the number of states does not change over time.
cData1 %>%
filter(Year>1850) %>%
group_by(Year) %>%
summarise(Temp = mean(AverageTemperature)) ->cData2
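One way to check the reasoning behind the 1850 cutoff is to count how many states report in each year. The sketch below does this on made-up data; the `toy` frame and its values are hypothetical, and on the real data the same pipeline would presumably be run on `cData1`.

```r
library(dplyr)

# Hypothetical stand-in for cData1: state "B" has no records before 1850
toy <- data.frame(
  Year  = c(1849, 1850, 1850, 1851, 1851),
  State = c("A", "A", "B", "A", "B"),
  AverageTemperature = c(10, 11, 9, 12, 8),
  stringsAsFactors = FALSE
)

# Count the distinct states that contribute to each yearly average
coverage <- toy %>%
  group_by(Year) %>%
  summarise(n_states = n_distinct(State))

print(coverage)
```

If `n_states` is constant from some year onward, yearly means computed after that year are based on the same set of states.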
Now I can create a plot of the average temperature of the US for the years 1850-2013. It looks like the average temperature has risen over time.
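# qplot() is deprecated in recent ggplot2 releases, so the plot is built with ggplot() directly
g <- ggplot(cData2, aes(x = Year, y = Temp)) +
  geom_point(aes(colour = Temp)) +
  geom_smooth() +
  scale_color_gradient(low = "blue", high = "red") +
  labs(title = "U.S. Average Temperature 1850-2013", y = "Temperature")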
ggplotly(g)
I can also look at boxplots of the state average temperatures at roughly 40-year intervals: 1850, 1890, 1930, 1970 and 2013.
cData1 %>%
  filter(Year %in% c(1850, 1890, 1930, 1970, 2013)) %>%
  group_by(State, Year) %>%
  summarise(Temp = mean(AverageTemperature)) -> cData3
cData3$Year <- as.factor(cData3$Year)
bp <- ggplot(cData3, aes(x = Year, y = Temp, fill = "")) +
  geom_boxplot() +
  labs(x = "", y = "Temperature") +
  theme_classic() +
  scale_fill_brewer() +
  guides(fill = "none")
plot(bp)
An ANOVA table is useful for checking whether the average temperatures in the US differ between these years.
c=aov(Temp~Year, data=cData3)
summary(c)
##              Df Sum Sq Mean Sq F value Pr(>F)
## Year          4    218   54.52   2.807 0.0264 *
## Residuals   240   4662   19.43
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
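As a rough check on how the table fits together, the F value and its p-value can be recomputed from the printed mean squares and degrees of freedom. This is a sketch using the rounded numbers shown above, so it should agree with the table only up to rounding.

```r
# Values copied from the ANOVA table above (already rounded in the printout)
ms_year  <- 54.52   # Mean Sq for Year
ms_resid <- 19.43   # Mean Sq for Residuals
df_year  <- 4       # Df for Year
df_resid <- 240     # Df for Residuals

# F is the ratio of the between-year mean square to the residual mean square
f_value <- ms_year / ms_resid

# Pr(>F) is the upper-tail probability of the F distribution at that value
p_value <- pf(f_value, df_year, df_resid, lower.tail = FALSE)

round(f_value, 3)
round(p_value, 4)
```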
print(model.tables(c,"means"),digits=3)
## Tables of means
## Grand mean
##
## 11.33633
##
## Year
## Year
##  1850  1890  1930  1970  2013
## 10.51 10.93 11.12 10.94 13.18
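The relationship between the year means and the grand mean can be sketched directly from the printed values. This assumes every year contributes the same number of observations, which is what makes the grand mean a simple average of the year means; the numbers are the rounded ones from the table.

```r
# Year means copied from the table above (rounded to two decimals)
year_means <- c("1850" = 10.51, "1890" = 10.93, "1930" = 11.12,
                "1970" = 10.94, "2013" = 13.18)

# With equal group sizes (an assumption here), the grand mean is the plain average
grand_mean <- mean(year_means)

# Deviation of each year from the grand mean shows which year stands out
deviations <- year_means - grand_mean

round(grand_mean, 3)
round(deviations, 2)
```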
TukeyHSD(c)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Temp ~ Year, data = cData3)
##
## $Year
##                   diff        lwr      upr     p adj
## 1890-1850  0.415243197 -2.0323195 2.862806 0.9902477
## 1930-1850  0.607991497 -1.8395712 3.055554 0.9600409
## 1970-1850  0.420858844 -2.0267039 2.868422 0.9897377
## 2013-1850  2.666743197  0.2191805 5.114306 0.0250385
## 1930-1890  0.192748299 -2.2548144 2.640311 0.9995095
## 1970-1890  0.005615646 -2.4419471 2.453178 1.0000000
## 2013-1890  2.251500000 -0.1960627 4.699063 0.0877744
## 1970-1930 -0.187132653 -2.6346954 2.260430 0.9995637
## 2013-1930  2.058751701 -0.3888110 4.506314 0.1446209
## 2013-1970  2.245884354 -0.2016784 4.693447 0.0891290
The p-values point to differences in the average temperatures at the 10% level for 2013 vs 1970, 2013 vs 1890 and 2013 vs 1850; the 2013 vs 1850 comparison is also significant at the 5% level (p = 0.025). However, if the end year were 2011 or 2010 instead, this effect would likely disappear.
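Every interval in the Tukey table has the same half-width, and that width can be roughly reproduced from the ANOVA output. The sketch below assumes a balanced design with 49 observations per year (245 observations in total, inferred from the degrees of freedom 4 + 240 + 1, split evenly over 5 years) and uses the rounded residual mean square, so it matches the table only approximately.

```r
mse        <- 19.43  # residual Mean Sq from the ANOVA table (rounded)
df_resid   <- 240    # residual degrees of freedom
n_years    <- 5      # number of year groups being compared
n_per_year <- 49     # ASSUMPTION: 245 observations split evenly over 5 years

# Critical value of the studentized range distribution at the 95% family-wise level
q_crit <- qtukey(0.95, nmeans = n_years, df = df_resid)

# For equal group sizes the Tukey half-width reduces to q * sqrt(MSE / n)
half_width <- q_crit * sqrt(mse / n_per_year)

# Half-width read off the table, e.g. upr - diff for the 1890-1850 row
table_half_width <- 2.862806 - 0.415243197

round(c(computed = half_width, from_table = table_half_width), 3)
```

A pairwise difference is significant at the 5% family-wise level only when it exceeds this half-width, which among the ten comparisons is true only for 2013 vs 1850.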
Finally, I organize the data to get a data frame for a state choropleth and print the maps for 1850 and 2013.
# Rename "Georgia (State)" to "Georgia"
cData$State <- as.character(cData$State)
cData$State[cData$State == "Georgia (State)"] <- "Georgia"
cData$State <- as.factor(cData$State)
# Select the columns of interest and average by year and state
cData %>%
  select(Year, AverageTemperature, State) %>%
  group_by(Year, State) %>%
  summarise(value = mean(AverageTemperature)) -> cData4
# The data frame must have a column named region (all lower case) and another named value.
colnames(cData4)[2] <- "region"
cData4$region <- tolower(cData4$region)
# Keep only region and value for each map year
cData4 %>%
  ungroup() %>%
  filter(Year == 1850) %>%
  select(region, value) -> cData1850
cData4 %>%
  ungroup() %>%
  filter(Year == 2013) %>%
  select(region, value) -> cData2013
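The reshaping into the `region`/`value` layout can be illustrated with base R on a tiny made-up data frame. The `toy` values are hypothetical; the column names and the "Georgia (State)" label follow the code above.

```r
# Hypothetical yearly state averages, using the column names from the code above
toy <- data.frame(
  Year  = c(1850, 1850, 2013),
  State = c("Georgia (State)", "Texas", "Texas"),
  value = c(17.1, 18.3, 19.0),
  stringsAsFactors = FALSE
)

# Normalise the state label, then rename and lower-case the region column
toy$State[toy$State == "Georgia (State)"] <- "Georgia"
names(toy)[names(toy) == "State"] <- "region"
toy$region <- tolower(toy$region)

# Keep one year and only the two columns the map function expects
toy1850 <- toy[toy$Year == 1850, c("region", "value")]

print(toy1850)
```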
# Load the mapping packages
library(choroplethr)
library(choroplethrMaps)
print(state_choropleth(cData1850,
                       title = "Land Temperature 1850",
                       num_colors = 0,
                       legend = "Degrees"))
print(state_choropleth(cData2013,
                       title = "Land Temperature 2013",
                       num_colors = 0,
                       legend = "Degrees"))
The 2013 U.S. map seems to show darker shades of purple than the 1850 map. Alaska clearly had a higher land temperature in 2013 than in 1850.